Automated Arabic Text Categorization Using SVM and NB

Author

  • Saleh Alsaleem
Abstract

Text classification is a supervised learning technique that uses labeled training data to derive a classification system (classifier) and then uses that classifier to label unlabeled text data automatically. In this paper, we investigate the Naïve Bayesian (NB) method and the Support Vector Machine (SVM) algorithm on different Arabic data sets. The comparison is based on the most popular text evaluation measures. Experimental results on different Arabic text categorization data sets show that the SVM algorithm outperforms NB with regard to all measures.
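The supervised workflow described in the abstract (derive a classifier from labeled documents, then label unseen text and score the result) can be illustrated with a minimal sketch. The abstract does not specify the paper's preprocessing, features, data sets, or evaluation measures, so everything below is an assumption for illustration only: a whitespace tokenizer, a multinomial Naive Bayes model with add-one smoothing, a hypothetical four-document toy corpus, and precision, recall, and F1 as the measures. The SVM side of the comparison is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    class_counts = Counter(labels)          # documents per class
    word_counts = defaultdict(Counter)      # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()               # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.split():
            if token not in vocab:
                continue                    # ignore tokens never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy corpus, not taken from the paper.
train_docs = [
    "كرة قدم مباراة فريق",
    "فريق لاعب هدف مباراة",
    "بنك سوق اقتصاد",
    "سوق أسهم بنك استثمار",
]
train_labels = ["sports", "sports", "economy", "economy"]
test_docs = ["مباراة فريق هدف", "اقتصاد بنك أسهم"]
test_labels = ["sports", "economy"]

model = train_nb(train_docs, train_labels)
predictions = [predict_nb(model, doc) for doc in test_docs]
scores = precision_recall_f1(test_labels, predictions, positive="sports")
print(predictions, scores)
```

A real comparison would replace the toy corpus with a labeled Arabic data set, apply Arabic-specific preprocessing, and train an SVM on the same features so that both classifiers are scored with the same measures.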


Similar resources

A Survey on text categorization of Indian and non-Indian languages using supervised learning techniques

Categorization of text plays an important role in the text mining field. Text categorization is the process in which documents are assigned to their predefined categories. Automatic text categorization is an important task due to the large amount of electronic documents. This paper presents a survey of text categorization of Indian and non-Indian languages. There is very little work done in text cat...


An automated Arabic text categorization based on the frequency ratio accumulation

Compared to other languages, there is still a limited body of research which has been conducted for the automated Arabic Text Categorization (TC) due to the complex and rich nature of the Arabic language. Most of such research includes supervised Machine Learning (ML) approaches such as Naïve Bayes (NB), K-Nearest Neighbour (KNN), Support Vector Machine and Decision Tree. Most of these techniqu...


The Effect of Preprocessing on Arabic Document Categorization

Preprocessing is one of the main components in a conventional document categorization (DC) framework. This paper aims to highlight the effect of preprocessing tasks on the efficiency of the Arabic DC system. In this study, three classification techniques are used, namely, naive Bayes (NB), k-nearest neighbor (KNN), and support vector machine (SVM). Experimental analysis on Arabic datasets revea...


Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment

The activity of labeling documents according to their content is known as text categorization. Many experiments have been carried out to enhance text categorization by adding background knowledge to the document using knowledge repositories like WordNet, Open Project Directory (OPD), Wikipedia and Wikitology. In our previous work, we have carried out intensive experiments by extracting know...


A Comparative Study on Feature Weight in Thai Document Categorization Framework

Text categorization is the process of automatically assigning predefined categories to free text documents. Feature weighting, which calculates feature (term) values in documents, is one of the important preprocessing techniques in text categorization. This paper is a comparative study of feature weighting methods in statistical learning of the Thai Document Categorization Framework. Six methods were e...



Journal title:
  • Int. Arab J. e-Technol.

Volume 2

Publication date: 2011